Abstract. Approximating an adequate number of clusters in multidimensional
data is an open area of research, given a level of compromise made on the
quality of acceptable results. The manuscript addresses the issue by
formulating a transductive-inductive learning algorithm which uses the
multivariate Chebyshev inequality. Considering the clustering problem in
imaging, theoretical proofs for a particular level of compromise are derived
to show the convergence of the reconstruction error to a finite value with
increasing (a) number of unseen examples and (b) number of clusters,
respectively. Upper bounds for these error rates are also proved.
Non-parametric estimates of these errors from a random sample of sequences
empirically point to a stable number of clusters. Lastly, the generalization
of the algorithm can be applied to multidimensional data sets from different
fields.

1 Introduction

The estimation of clusters has been approached either via a batch framework,
where the entire data set is presented and different initializations of seed
points or prototypes are tested to find a model of clusters that fits the
data, as in k-means [S] and fuzzy C-means [5], or via an online strategy,
where clusters are approximated as new examples of data are presented one at
a time, using variational Dirichlet processes [7] and incremental clustering
based on randomized algorithms [3]. It is widely known that approximating an
adequate number of clusters for a multidimensional data set is an open
problem, and a variety of solutions have been proposed using Monte Carlo
studies [5], a Bayesian-Kullback learning scheme in a mean squared error
setting or Gaussian mixture [H], model based approaches [3] and information
theory [17], to cite a few.

This work deviates from the general strategy of defining the number of
clusters a priori. It defines a level of compromise, tolerance or confidence
in the quality of clustering, which gives an upper bound on the number of
clusters generated. Note that this is not at all the same as defining the
number of clusters. It only indicates the level of confidence in the result;
the requirement is still to estimate the adequate number of clusters, which
may be well below the bound. The current work focuses on approximating the
number of clusters in an online paradigm when the confidence level has been
specified. In certain aspects it is similar to the recent work on conformal
learning theory [18], and it presents a novel way of approximating the
clusters with a degree of confidence.

Conformal learning theory [15], which has its foundations in a
transductive-inductive paradigm, deals with estimating the quality of
predictions made on an unlabeled example based on the already processed data.
Mathematically, given a set of already processed examples
$(x_1, y_1), (x_2, y_2), \ldots, (x_{i-1}, y_{i-1})$, the conformal predictors
give a point prediction $\hat{y}$ for the unseen example $x_i$ with a
confidence level of $\Gamma^{\epsilon}$. Thus it estimates the confidence in
the quality of the prediction using the original label $y_i$ after the
prediction has been made and before moving on to the next unlabeled example.
These predictions are made on the basis of a nonconformity measure, which
checks how different the new example is from a bag of already seen examples.
A bag is considered to be a finite sequence $(z_1, z_2, \ldots, z_{i-1})$ of
examples, where $z_i = (x_i, y_i)$. Using the idea of exchangeability, it is
known from [18] that, under a weaker assumption, for every positive integer
$i$, every permutation $\pi$ of $\{1, 2, \ldots, i\}$, and every measurable
set $E \subset Z^i$, the probability distribution $P$ satisfies

$$
P\{(z_1, z_2, \ldots) \in Z^{\infty} : (z_1, z_2, \ldots, z_i) \in E\}
= P\{(z_1, z_2, \ldots) \in Z^{\infty} : (z_{\pi(1)}, z_{\pi(2)}, \ldots, z_{\pi(i)}) \in E\}.
$$

A prediction for the new example $x_i$ is made if and only if the frequency
(p-value) of exchanging the new example with another example in the bag is
above a certain value.
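The p-value idea can be illustrated with a minimal sketch. The nonconformity measure used here (distance of an example from the mean of the remaining examples) and the function name `conformal_p_value` are assumptions chosen for illustration; they are not taken from [15] or [18], and one-dimensional examples are used only to keep the arithmetic visible.

```python
import statistics


def conformal_p_value(bag, new_example):
    """Fraction of examples that are at least as nonconforming as the new one."""
    examples = list(bag) + [new_example]

    def nonconformity(idx):
        # Assumed measure: distance from the mean of all the other examples.
        others = examples[:idx] + examples[idx + 1:]
        return abs(examples[idx] - statistics.mean(others))

    scores = [nonconformity(i) for i in range(len(examples))]
    new_score = scores[-1]
    return sum(s >= new_score for s in scores) / len(examples)


bag = [1.0, 1.2, 0.9, 1.1, 1.05]
p_typical = conformal_p_value(bag, 1.0)   # new example resembles the bag
p_outlier = conformal_p_value(bag, 5.0)   # new example is far from the bag
print(p_typical, p_outlier)
```

An example that resembles the bag receives a larger p-value than an outlier, so thresholding the p-value decides whether a prediction is made.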

This manuscript finds its motivation in the foregoing theory of online
prediction using the transductive-inductive paradigm. The research work
couples the creation of new clusters via transduction with the aggregation of
examples into these clusters via induction. It is similar to [15] in
utilizing the idea of a prediction region defined by a certain level of
confidence. It presents a simple algorithm that differs significantly from
conformal learning in the following aspects: (1) Instead of working with
sequences of data that contain labels, it works on unlabeled sequences.
(2) Due to the first point, it becomes imperative to estimate the number of
clusters, which is not known a priori, and the proposed algorithm does so by
employing a Chebyshev inequality. The inequality provides an upper bound on
the number of clusters that could be generated on a random sample of
sequences. (3) The quality of the prediction in conformal learning is checked
based on the p-values generated online. The current algorithm relaxes this
restriction of checking the quality online and just estimates the clusters as
the data are presented. (4) The foregoing step makes the algorithm a weak
learner, as it is sequence dependent. To address this problem, a global
solution for the adequate number of clusters is approximated by computing
kernel density estimates on a sample of random sequences of the data.
Finally, the level of compromise captured by a parameter in the inequality
gives an upper bound on the number of clusters generated. In the case of
clustering in static images, for a particular parameter value, theoretical
proofs show that the reconstruction error converges to a finite value with
increasing (a) number of unseen examples and (b) number of clusters.
Empirical kernel density estimates of the reconstruction error over a random
sample of sequences on toy examples indicate the number of clusters that have
a high probability of low reconstruction error. Labeled data are not always
present to compute the reconstruction error. In that case the proposed
algorithm stops short at density estimation of the approximated number of
clusters from a random sequence of examples, with a certain degree of
confidence.

Another dimension of the proposed work is the use of a generalization of the
multivariate formulation of the Chebyshev inequality [?], [10], [11]. It is
known that the Chebyshev inequality helps in proving the convergence of
random sequences of different data. Also, the multivariate formulation of the
Chebyshev inequality facilitates providing bounds for multidimensional data,
which are often afflicted by the curse of dimensionality, making it difficult
to compute multivariate probabilities. One of the generalizations that exist
for the multivariate Chebyshev inequality considers the probability content
of a multivariate normal random vector lying in a Euclidean $n$-dimensional
ball [14], [13]. This work employs a more conservative approach, namely the
Euclidean $n$-dimensional ellipsoid, which restricts the spread of the
probability content [3]. The works in [3] and [16] provide motivation for
employing the multivariate Chebyshev inequality.

Efficient implementation and analysis of k-means clustering using the
multivariate Chebyshev inequality has been shown in [9]. The current work
differs from k-means in (1) providing an online setting for the problem of
clustering, (2) estimating the number of clusters for a particular sequence
representation of the same data via convergence through the ellipsoidal
multivariate Chebyshev inequality, given the level of confidence, compromise
or tolerance in the quality of results, (3) generating global approximations
of the number of clusters from non-parametric estimates of reconstruction
error rates for a sample of random sequences representing the same data, and
(4) not fixing the cluster number a priori. It must be noted that in k-means
the solutions may differ for different initializations for a particular value
of $k$, but the value of $k$ as such remains fixed. In the proposed work, an
estimate is made, with high probability, of the minimum number of clusters
that can represent the data with low reconstruction error. This outlook
broadens the perspective to finding multiple solutions which are upper
bounded, as well as approximating a particular number of clusters which have
similar solutions. This similarity in solutions for a particular number of
clusters is attributed to the constraint imposed by the Chebyshev inequality.
A key point to be noted is that, using increasing levels of compromise or
confidence as in conformal learners, the proposed work generates a nested set
of solutions. Low confidence or compromise levels generate tight solutions
and vice versa. Thus the proposed weak learner provides a set of solutions
which are robust over a sample.

This manuscript also extends the work of [16] on employing the multivariate
Chebyshev inequality for image representation. The work in [16] presents a
hybrid model based on the Hilbert space-filling curve [5] to traverse the
image. Since this curve preserves the local information in the neighbourhood
of a pixel, it reduces the burden of good image representation in lower
dimensions. On the other hand, it acts as a constraint, forcing the image to
be processed in a particular fashion. The current work removes this
restriction of processing images via space-filling curves by considering any
pixel sequence that represents the image under consideration. Again, a single
sequence may not be adequate for the learner to synthesize the image to a
recognizable level. This can be attributed to the fact that, in an
unsupervised paradigm, the number of clusters is not known a priori and the
learner is sequence dependent. To reiterate, the proposed work addresses the
issues of
• recognizability, by defining a level of compromise that a user is willing
  to make via the Chebyshev parameter $C_p$, and
• sequence-specific solutions, by taking random samples of pixel sequences
  from the same image.
The latter helps in estimating a population-dependent solution, which yields
a robust and stable synthesis. Regularization of these errors over the
approximated number of clusters for different levels of compromise leads to
an adequate number of clusters that synthesize the image with minimal
deviation from the original image.

Thus the current work provides a new perspective on approximating the cluster
number at a particular confidence level. To test the propositions made, the
problem of clustering in images is taken into account. Generalizations of the
algorithm can be made and applied to different fields involving
multidimensional data sets in an online setting. Let $\mathcal{I}$ be an RGB
image. A pixel in $\mathcal{I}$ is an example $x_i$ with $\mathcal{N}$
dimensions (here $\mathcal{N} = 3$). It is assumed that examples appear
randomly without repetition for the proposed unsupervised learner. Note that
when the sample space (here the image $\mathcal{I}$) has a finite number of
examples in it (here $\mathcal{M}$ pixels), the total number of unique
sequences is $\mathcal{M}!$. When $\mathcal{M}$ is large,
$\mathcal{M}! \to \infty$. Currently, the algorithm works on a subset of
unique sequences sampled from the $\mathcal{M}!$ sequences. Every sequence is
equally likely to occur (in this case with probability $1/\mathcal{M}!$). RGB
images from the Berkeley Segmentation Benchmark (BSB) [12] have been taken
into consideration for the current study.
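Drawing a subset of the possible pixel orderings can be sketched as follows. This is a minimal illustration, not the sampling code used for the study; the function name `sample_pixel_sequences` and its parameters are hypothetical, and pixels are represented only by their indices.

```python
import math
import random


def sample_pixel_sequences(num_pixels, num_sequences, seed=0):
    """Draw distinct, equally likely orderings of the indices 0..num_pixels-1."""
    # Only check feasibility for tiny inputs; M! is astronomically large otherwise.
    if num_pixels <= 20 and num_sequences > math.factorial(num_pixels):
        raise ValueError("requested more sequences than exist")
    rng = random.Random(seed)
    seen = set()
    sequences = []
    while len(sequences) < num_sequences:
        order = list(range(num_pixels))
        rng.shuffle(order)              # uniform random permutation
        key = tuple(order)
        if key not in seen:             # keep each sequence only once
            seen.add(key)
            sequences.append(order)
    return sequences


seqs = sample_pixel_sequences(num_pixels=6, num_sequences=5)
print(math.factorial(6), "possible sequences;", len(seqs), "sampled")
```

For a real image the duplicate check almost never triggers, because the number of possible orderings dwarfs any practical sample size.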

2 Transductive-Inductive Learning Algorithm 

Given that the examples ($z_i = x_i$) in a sequence appear randomly, the
challenge is to (1) learn the association of a particular example to existing
clusters or (2) create a new cluster, based on the information provided by
already processed examples. The current algorithm handles the two issues via
(1) evaluation of a nonconformity measure defined by the multivariate
Chebyshev inequality formulation and (2) 1-Nearest Neighbour (1-NN)
transductive learning, respectively. The multivariate formulation of the
generalized Chebyshev inequality [4] is applied to a new example using a
single Chebyshev parameter. This inequality tests the deviation of the new
example from the mean of a cluster of examples and gives a lower
probabilistic bound on whether the example belongs to the cluster under
investigation. If the new random example passes the test, then it is
associated with the cluster, and the mean and covariance matrix of the
cluster are recomputed. In case more than one cluster qualifies for
association, the cluster with the lowest deviation from the new example is
picked. It is also possible to assign the new example to a randomly chosen
cluster from the selected clusters to induce noise and then check the
approximations of the number of clusters. This has not been considered in the
current work. In case no association is found, the algorithm employs the 1-NN
transductive algorithm to find the closest neighbour of the example currently
being processed. This neighbour, together with the current example, forms a
new cluster.

Algorithm 1 Unsupervised Learner

procedure UnsupervisedLearner(img, C_p)
    [nrows, ncols] ← size(img)
    M ← nrows × ncols                  ▷ Total no. of unseen examples
    pt_cntr ← 0                        ▷ Number of examples encountered
    pt_idx ← {1, 2, ..., M}            ▷ Indices of unseen examples
    Initialize variables
    cluster_cntr ← 0                   ▷ Number of clusters
    CumErr_val ← 0                     ▷ Cumulative value
    Err_1 ← []                         ▷ Error rate as no. of examples increases
    Err_2 ← []                         ▷ Error rate as no. of clusters increases
    while Card(pt_idx) examples remain unprocessed do    ▷ pt_idx ⊆ {1, 2, ..., M}
        Choose a random example x_i s.t. i ∈ pt_idx
        pt_cntr ← pt_cntr + 1
        Update pt_idx, i.e. pt_idx ← pt_idx − {i}
        CRITERION ← []
        Err_val ← 0
        ∀ q clusters, where q ∈ {0, 1, 2, ..., cluster_cntr}
            Err_val ← Σ_{k=1}^{n_q} (x_k − E_q(x))^2     ▷ x means all examples in cluster q
            CumErr_val ← CumErr_val + Err_val
            Compute D_q ← (x_i − E_q(x))^T Σ_q^{-1} (x_i − E_q(x))
            If D_q < C_p                                 ▷ C_p is the Chebyshev parameter
                CRITERION ← [CRITERION; D_q, q]
        If more than one cluster associates to x_i, i.e. length(CRITERION) > 1
            Associate x_i to the selected cluster q with minimum D_q
        Err_1 ← [Err_1, Err_val/pt_cntr]
        If x_i is not associated with any cluster, i.e. sum(FOUND) == 0
            Err_2 ← [Err_2, Err_val/pt_cntr]
            cluster_cntr ← cluster_cntr + 1
            Using 1-NN find x_j closest to x_i s.t. j ∈ pt_idx
            Update pt_idx, i.e. pt_idx ← pt_idx − {j}
            Form a new cluster {x_i, x_j}
    end while
end procedure

Several important implications arise from the use of a probabilistic
inequality as a nonconformity measure. These will be elucidated in detail in
later sections. An important point to consider here is the use of the 1-NN
algorithm to create a new cluster. Even though it is known that 1-NN suffers
from the curse of dimensionality, for problems with small dimensions it can
be employed for transductive learning. The aim of the proposed work is not to
address the curse of dimensionality. Also, note that in the general
supervised conformal learning algorithm, a prediction has to be made before
the next random example is processed. This is not the case in the current
unsupervised framework of the conformal learning algorithm. In case the
current random example fails to associate with any of the existing clusters
under the constraint imposed by the Chebyshev parameter, 1-NN finds the
closest example (in feature space) from the remaining unprocessed sample data
set to form a new cluster. Thus the formation of a new cluster depends on the
strictness of the Chebyshev parameter $C_p$. The procedure for unsupervised
conformal learning is presented in Algorithm 1. It does not strictly follow
the idea of finding confidence in the prediction, as labels are not present
to be tested against. The goal here is to reconstruct the clusters from a
single pixel sequence such that they represent the image. The quality of the
reconstruction is taken up later, when a random sample of pixel sequences is
used to estimate the probability density of the reconstruction error rates.
Note that in the algorithm, $E_q(x)$ represents the mean of the examples $x$
in the $q$th cluster and $\Sigma_q$ is the covariance matrix of the
$\mathcal{N}$-dimensional feature examples of the $q$th cluster.
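The association and cluster-creation steps described in this section can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the name `unsupervised_learner` is hypothetical, error-rate bookkeeping is left out, and a small `ridge` term is added to each covariance matrix because a freshly formed two-example cluster has a singular covariance, a case the text does not discuss.

```python
import numpy as np


def unsupervised_learner(examples, c_p, ridge=1e-3, seed=0):
    """One random pass over the rows of `examples` (shape M x N)."""
    examples = np.asarray(examples, dtype=float)
    rng = np.random.default_rng(seed)
    n_examples, n_dims = examples.shape
    labels = np.full(n_examples, -1)
    unseen = set(range(n_examples))
    members = []                      # members[q] = indices of examples in cluster q

    for i in rng.permutation(n_examples):
        i = int(i)
        if i not in unseen:           # already consumed as a 1-NN partner
            continue
        unseen.discard(i)
        x_i = examples[i]

        # Induction: Chebyshev test against every existing cluster.
        best_q, best_d = None, np.inf
        for q, idx in enumerate(members):
            pts = examples[idx]
            diff = x_i - pts.mean(axis=0)
            # Assumed ridge term keeps the covariance matrix invertible.
            cov = np.cov(pts, rowvar=False) + ridge * np.eye(n_dims)
            d_q = float(diff @ np.linalg.solve(cov, diff))
            if d_q < c_p and d_q < best_d:
                best_q, best_d = q, d_q

        if best_q is not None:
            members[best_q].append(i)
            labels[i] = best_q
            continue

        # Transduction: pair x_i with its nearest unseen neighbour (1-NN).
        new_cluster = [i]
        if unseen:
            rest = np.fromiter(unseen, dtype=int)
            dists = np.linalg.norm(examples[rest] - x_i, axis=1)
            j = int(rest[np.argmin(dists)])
            unseen.discard(j)
            new_cluster.append(j)
        labels[new_cluster] = len(members)
        members.append(new_cluster)
    return labels, members


# Synthetic stand-in for pixel data: two well separated groups of 3-D points.
demo_rng = np.random.default_rng(1)
data = np.vstack([demo_rng.normal(0.0, 1.0, (40, 3)),
                  demo_rng.normal(50.0, 1.0, (40, 3))])
labels, members = unsupervised_learner(data, c_p=9.0)
print(len(members), "clusters found")
```

Iterating over a random permutation and skipping already consumed indices is equivalent to repeatedly choosing a random unseen example. The sketch recomputes each cluster mean and covariance from scratch, which is slow for full images but keeps the logic close to the description above.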

3 Theoretical Perspective 

Application of the multivariate Chebyshev inequality, which yields a
probabilistic bound, has certain important implications for the clusters that
are generated. To elucidate the algorithm, the starfish image is taken from
[12].

3.1 Multivariate Chebyshev Inequality 

Let $X$ be a stochastic variable in $\mathcal{N}$ dimensions with mean
$E[X]$. Further, let $\Sigma$ be the covariance matrix of all observations,
each containing $\mathcal{N}$ features, and let $C_p \in \mathbb{R}$. Then
the multivariate Chebyshev inequality in [?] states that:

$$
\begin{aligned}
\mathcal{P}\{(X - E[X])^T \Sigma^{-1} (X - E[X]) \geq C_p\} &\leq \frac{\mathcal{N}}{C_p} \\
\mathcal{P}\{(X - E[X])^T \Sigma^{-1} (X - E[X]) < C_p\} &\geq 1 - \frac{\mathcal{N}}{C_p}
\end{aligned}
\tag{1}
$$

i.e. the probability of the spread of the value of $X$ around the sample mean
$E[X]$ being greater than $C_p$ is less than $\mathcal{N}/C_p$. There is a
minor variation for the univariate case, stating that the probability of the
spread of the value of $x$ around the mean $\mu$ being greater than
$C_p \sigma$ is less than $1/C_p^2$. Apart from this minor difference, both
formulations convey the same message about the probabilistic bound imposed
when a random vector or number $X$ lies away from the mean of the sample by a
value of $C_p$.
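The bound in equation 1 can be checked empirically with a small simulation. This is a sketch under a simplifying assumption: the samples are standard normal vectors with known mean 0 and identity covariance, so the quadratic form reduces to a sum of squared coordinates. The function name `chebyshev_tail_fraction` is hypothetical.

```python
import random


def chebyshev_tail_fraction(n_dims, c_p, n_samples=20000, seed=0):
    """Empirical P{(X - E[X])^T Sigma^{-1} (X - E[X]) >= c_p} for X ~ N(0, I)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_samples):
        # With mean 0 and covariance I, the quadratic form is sum of squares.
        q = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_dims))
        if q >= c_p:
            exceed += 1
    return exceed / n_samples


for c_p in (5.0, 7.0, 10.0):
    frac = chebyshev_tail_fraction(3, c_p)
    print(f"C_p={c_p}: empirical tail={frac:.4f}, Chebyshev bound={3 / c_p:.4f}")
```

The empirical tail fraction stays below $\mathcal{N}/C_p$ in each case; the bound is loose for normal data because it holds for any distribution with the given mean and covariance.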
Fig. 1. A random sequence of the starfish image segmented via the
unsupervised conformal learning algorithm. The tuple ($C_p$, NoClust, TRErr)
contains the Chebyshev parameter ($C_p$), the number of clusters generated
using $C_p$ (NoClust) and the total reconstruction error of the generated
image from the original image (TRErr): (a) (3, 1034, 17.746),
(b) (5, 271, 36.32), (c) (7, 159, 54.71), (d) (9, 45, 40.591),
(e) (11, 31, 62.606), (f) (13, 29, 66.061), (g) (15, 33, 65.424),
(h) (17, 24, 64.98).



3.2 Association to Clusters 

Once a cluster is initialized (say with $x_i$ and $x_j$), the size of the
cluster depends on the number of examples that get associated with it. The
multivariate formalism of the Chebyshev inequality controls the degree of
uniformity of the feature values of the examples that constitute the cluster.
The association of an example to a cluster happens as follows. Let a new
random example (say $x_t$) be considered for association to a cluster. If the
spread of example $x_t$ from $E_q(x)$ (the mean of the $q$th cluster
$\{x_i, x_j\}$), factored by the covariance matrix $\Sigma_q$, is below
$C_p$, then $x_t$ is considered a part of the cluster. Using the Chebyshev
inequality, it boils down to:

$$
\begin{aligned}
\mathcal{P}\{(x_t - E_q[x_i, x_j])^T \Sigma_q^{-1} (x_t - E_q[x_i, x_j]) \geq C_p\} &\leq \frac{\mathcal{N}}{C_p} \\
\mathcal{P}\{(x_t - E_q[x_i, x_j])^T \Sigma_q^{-1} (x_t - E_q[x_i, x_j]) < C_p\} &\geq 1 - \frac{\mathcal{N}}{C_p}
\end{aligned}
\tag{2}
$$

Satisfaction of this criterion suggests a possible cluster to which $x_t$
could be associated. This test is conducted for all existing clusters. If
there is more than one cluster to which $x_t$ can be associated, then the
cluster which shows the minimum deviation from the new random point is
chosen. Once the cluster is chosen, its size is extended by one more example,
i.e. $x_t$. The cluster now consists of $\{x_i, x_j, x_t\}$. If no
association is found at all, a new cluster is initialized, and the process
repeats until all unseen examples have been processed. Satisfaction of the
inequality gives a lower probabilistic bound of $1 - \mathcal{N}/C_p$ on the
size of the cluster, if the second version of the Chebyshev formula is under
consideration. Thus the clusters grow under a probabilistic constraint in a
homogeneous manner. For a highly inhomogeneous image, a cluster may be very
restricted or small due to the large deviation of pixel intensities from the
cluster it is being tested against.

Once the pixels have been assigned to their respective decompositions, all
pixels in a single decomposition are assigned the average of the intensities
of the pixels that constitute the decomposition. This is done under the
assumption that decomposed clusters will be homogeneous in nature, with the
degree of homogeneity controlled by $C_p$. Figure 1 shows the results of
clustering for varying values of $C_p$ for the starfish image from [12].
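The averaging step can be sketched as follows. The function name `reconstruct` is hypothetical, and the total returned here is a plain sum of squared differences; how the TRErr values reported in Fig. 1 are normalized is not stated, so this total is an assumption rather than a reproduction of that quantity.

```python
import numpy as np


def reconstruct(examples, labels):
    """Replace every example by the mean of its cluster."""
    examples = np.asarray(examples, dtype=float)
    labels = np.asarray(labels)
    recon = np.empty_like(examples)
    for q in np.unique(labels):
        mask = labels == q
        recon[mask] = examples[mask].mean(axis=0)   # cluster mean as new value
    # Squared difference between original and reconstructed values, per example.
    sq_err = np.sum((examples - recon) ** 2, axis=1)
    return recon, float(sq_err.sum())


pixels = np.array([[0, 0, 0], [2, 2, 2], [10, 10, 10]])
recon, total_err = reconstruct(pixels, labels=[0, 0, 1])
print(recon)
print(total_err)
```

In this toy case the first two pixels share a cluster and both become (1, 1, 1), while the third pixel is its own cluster and is reproduced exactly.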

3.3 Implications 

In [?] various implications have been proposed for using the multivariate
Chebyshev inequality for image representation using a space-filling curve. In
order to extend that work, a few implications are reiterated for further
development. The inequality being a criterion, the probability associated
with it gives a belief-based bound on the satisfaction of the criterion. In
order to proceed, a definition of a decomposition is first needed.

Definition 1. Let $\mathcal{D}$ be a decomposition which contains a set of
points $x$ with a mean of $E_q(x)$. The set expands by testing a new point
$x_t$ via the Chebyshev inequality
$\mathcal{P}\{(x_t - E_q(x))^T \Sigma_q^{-1} (x_t - E_q(x)) < C_p\} \geq 1 - \frac{\mathcal{N}}{C_p}$.
The decomposition may include the point $x_t$ depending on the outcome of the
criterion. A point to be noted is that, if the new point $x_t$ belongs to
$\mathcal{D}$, then $\mathcal{D}$ can be represented as
$(x_t - E_q(x))^T \Sigma_q^{-1} (x_t - E_q(x))$.

Lemma 1. Decompositions $\mathcal{D}$ are bounded by a lower probability
bound of $1 - \mathcal{N}/C_p$.

Lemma 2. The value of $C_p$ reduces the size of the sample from $\mathcal{M}$
to an upper bound of $\mathcal{M}/C_p$ probabilistically, with a lower bound
of $1 - \mathcal{N}/C_p$. Here $\mathcal{M}$ is the number of examples in the
image.

Lemma 3. As $C_p \to \mathcal{N}$ the lower probability bound drops to zero,
implying that a large number of small decompositions $\mathcal{D}$ can be
achieved. Vice versa for $C_p \to \infty$.
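As a worked illustration of the lower bound in the lemmas, the values below evaluate $1 - \mathcal{N}/C_p$ for RGB pixels ($\mathcal{N} = 3$). The chosen values of $C_p$ are picked for illustration only.

```latex
\[
\left. 1 - \frac{\mathcal{N}}{C_p} \right|_{\mathcal{N} = 3}:
\qquad
C_p = 3 \mapsto 0, \quad
C_p = 7 \mapsto \tfrac{4}{7} \approx 0.571, \quad
C_p = 10 \mapsto \tfrac{7}{10} = 0.7, \quad
C_p = 17 \mapsto \tfrac{14}{17} \approx 0.824 .
\]
```

The bound is vacuous at $C_p = \mathcal{N}$ and approaches 1 as $C_p$ grows, which matches the two limits stated in Lemma 3.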

It was stated that the image can be reconstructed from pixel sequences at a
certain level of compromise. From Lemma 2 it can be seen that $C_p$ reduces
the sample size while inducing a certain amount of error due to loss of
information via averaging. This reduction in sample size indicates the level
of compromise at which the image is to be processed. This reduction in sample
size, or level of compromise, is also directly related to the construction of
probabilistically bounded decompositions. Since the decompositions are
generated via the use of $C_p$ in equation 1, the belief in their existence
in terms of a lower probability bound (from Lemma 1) suggests a confidence in
the amount of error incurred in reconstructing the image. For a particular
pixel, this reconstruction error can be computed by squaring the difference
between the intensity value in the original image and the intensity value
assigned after clustering. Since a somewhat homogeneous decomposition is
bounded probabilistically, the reconstruction errors of the pixels that
constitute it are also bounded probabilistically. Thus, for all
decompositions, the sum of reconstruction errors over all pixels is bounded.
The bound indicates the confidence in the generated reconstruction error.
Also, by Lemma 2, since the number of decompositions or clusters is upper
bounded, the total reconstruction error is also upper bounded. It now remains
to be proven that, for a particular level of compromise, the error rates
converge as the number of processed examples and the number of clusters
increase.

Fig. 2. Error rate $Err_1$ for a particular sequence with increasing number
of examples and $C_p = 7$.

In Algorithm 1 three error rates are computed as the random sequence of
examples gets processed. For each original pixel
$x_i \in \mathbb{R}^{\mathcal{N}}$ in the image, let $x_i^{R}$ be the
intensity value assigned after clustering. Then the reconstruction error for
pixel $x_i$ is the squared 2-norm $\|x_i - x_i^{R}\|_2^2$. Since a pixel
assigned to a particular decomposition $\mathcal{D}_q$ gets the mean of all
pixels that constitute $\mathcal{D}_q$, the reconstruction error for a pixel
turns out to be $\|x_i - E_q(x)\|_2^2$. For each cluster $q$ with $n_q$
examples, the reconstruction error is
$Err_{\mathcal{D}_q} = \sum_{i=1}^{n_q} \|x_i - E_q(x)\|_2^2$. Note that the
error also indicates how much the examples deviate from the mean of their
respective cluster. As new examples are processed based on the information
present from the previous examples, the total error computed after processing
the first $pt_{cntr}$ examples in a random sequence is
$Err_{val} = \sum_{q=1}^{cluster_{cntr}} Err_{\mathcal{D}_q}$. The error rate
for these $pt_{cntr}$ examples is $Err_1 = Err_{val}/pt_{cntr}$. Finally, an
error rate is computed that captures how the examples deviate from their
respective cluster means after the formation of a new cluster. This error
rate is denoted by $Err_2$. The formula for $Err_2$ is the same as that for
$Err_1$ but with a minute change in conception: the $Err_{val}$ values are
divided by the total number of points processed after the formation of every
new cluster.
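The two error rates can be traced on a toy sequence with the sketch below. It is a simplification and not a transcription of Algorithm 1: every example is counted individually, the error is evaluated after the example has been assigned, and squared deviations from the current cluster mean are used. The function name `error_rates` and the input format are hypothetical.

```python
def error_rates(assignments):
    """assignments: list of (example, cluster_id) pairs in processing order."""
    stats = {}            # cluster_id -> [count, coordinate sums, sum of squared norms]
    err_1, err_2 = [], []
    for step, (x, q) in enumerate(assignments, start=1):
        is_new = q not in stats
        if is_new:
            stats[q] = [0, [0.0] * len(x), 0.0]
        s = stats[q]
        s[0] += 1
        s[1] = [a + b for a, b in zip(s[1], x)]
        s[2] += sum(v * v for v in x)
        # Err_val: sum over clusters of squared deviations from the cluster mean,
        # using sum ||x||^2 - ||sum x||^2 / n for each cluster.
        err_val = sum(sq - sum(c * c for c in sums) / n
                      for n, sums, sq in stats.values())
        rate = err_val / step
        err_1.append(rate)            # recorded after every example
        if is_new:
            err_2.append(rate)        # recorded only when a cluster is created
    return err_1, err_2


toy = [((0.0,), 0), ((2.0,), 0), ((10.0,), 1), ((12.0,), 1)]
err_1, err_2 = error_rates(toy)
print(err_1)
print(err_2)
```

On this toy input the first list has one entry per processed example and the second has one entry per newly created cluster, mirroring the distinction between $Err_1$ and $Err_2$.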

Theorem 1. Let $Z_i$ be a random sequence that represents the entire image
$\mathcal{I}$. If $Z_i$ is decomposed into clusters via the Chebyshev
inequality using the unsupervised learner, then the reconstruction error rate
$Err_1$ converges asymptotically with a probabilistic lower bound or
confidence level of $1 - \mathcal{N}/C_p$ or greater.

Proof. It is known that the total reconstruction error after $pt_{cntr}$
examples have been processed is
$Err_{val} = \sum_{q=1}^{cluster_{cntr}} Err_{\mathcal{D}_q}$, and the error
rate is $Err_1 = Err_{val}/pt_{cntr}$. It is also known from equation 1 that
an example is associated to a particular decomposition $\mathcal{D}_q$ if it
satisfies the constraint
$(x_t - E_q(x))^T \Sigma_q^{-1} (x_t - E_q(x)) < C_p$. Since $C_p$ defines
the level of compromise on the image via Lemma 2 and the decomposition
$\mathcal{D}_q$ is almost homogeneous, all examples that constitute a
decomposition have similar attribute values. Due to this similarity between
the attribute values, the non-diagonal elements of the covariance matrix in
the inequality above approach zero. Thus,
$\Sigma_q^{-1} \approx \det|\Sigma_q^{-1}| I$, where $I$ is the identity
matrix. The inequality then equates to:

$$
\begin{aligned}
(x_t - E_q(x))^T \det|\Sigma_q^{-1}| I (x_t - E_q(x)) &< C_p \\
\det|\Sigma_q^{-1}| (x_t - E_q(x))^T I (x_t - E_q(x)) &< C_p \\
(x_t - E_q(x))^T I (x_t - E_q(x)) &< \frac{C_p}{\det|\Sigma_q^{-1}|}
\end{aligned}
$$

Thus, if $x_i = x_t$ was the last example to be associated to a
decomposition, the reconstruction error $\|x_i - E_q(x)\|_2^2$ for that
example would be upper bounded by $\frac{C_p}{\det|\Sigma_q^{-1}|}$.
Consequently, the total error after processing $pt_{cntr}$ examples is also
upper bounded, i.e.

$$
\begin{aligned}
Err_{val} &= \sum_{q=1}^{cluster_{cntr}} Err_{\mathcal{D}_q} \\
&= \sum_{q=1}^{cluster_{cntr}} \sum_{i=1}^{n_q} \|x_i - E_q(x)\|_2^2 \\
&< \sum_{q=1}^{cluster_{cntr}} \sum_{i=1}^{n_q} \frac{C_p}{\det|\Sigma_q^{-1}|} \\
&= \sum_{q=1}^{cluster_{cntr}} \frac{n_q \, C_p}{\det|\Sigma_q^{-1}|}
\end{aligned}
$$

Thus the error rate $Err_1 = Err_{val}/pt_{cntr}$ is also upper bounded.
Different decompositions may have different $\Sigma_q^{-1}$, but in the
worst-case scenario, if the decomposition with the lowest covariance is
substituted for every other decomposition, then the upper bound on the error
rate is
$\frac{C_p}{\det|\Sigma_{lowest}^{-1}|} \times \frac{\sum_{q=1}^{cluster_{cntr}} n_q}{pt_{cntr}}$,
which equates to $\frac{C_p}{\det|\Sigma_{lowest}^{-1}|}$ since
$\sum_{q=1}^{cluster_{cntr}} n_q = pt_{cntr}$. □

It is important to note that this error rate converges to a finite value
asymptotically as the number of processed examples increases. Initially, when
the learner has not seen enough examples to learn and solidify the knowledge
in terms of stable means and variances of the decompositions, the error rate
$Err_1$ increases as new examples are presented. This is attributed to the
fact that new clusters are formed more often in the initial stages, due to
the lack of prior knowledge. After a certain time, when a large number of
examples have been encountered to help solidify the knowledge or stabilize
the decompositions, the addition of further examples does not increase the
error. This stability of clusters is checked via the multivariate formulation
of the Chebyshev inequality in equation 2. The stability also causes the
error rate $Err_1$ to stabilize and thus indicates its convergence in a
bounded manner with a probabilistic confidence level. Thus for any value of
$pt_{cntr}$ there exists an upper bound on the reconstruction error, which
stabilizes as $pt_{cntr}$ increases.

For $C_p = 7$, image (c) in figure 1 shows the clustered image generated
using the unsupervised conformal learning algorithm. Pixels in a cluster of
the generated image have the mean of the cluster as their intensity value or
label. This holds for all clusters in the generated image. The total number
of clusters generated for a particular random sequence was 159. The error
rate $Err_1$ is depicted in figure 2.
Fig. 3. Error rate $Err_2$ for a particular sequence with increasing number
of clusters and $C_p = 7$.



Theorem 2. Let $Z_i$ be a random sequence that represents the entire image
$\mathcal{I}$. If $Z_i$ is decomposed into clusters via the Chebyshev
inequality using the unsupervised learner, then the reconstruction error rate
$Err_2$ converges asymptotically with a probabilistic lower bound or
confidence level of $1 - \mathcal{N}/C_p$ or greater.

Proof. The error rate $Err_2$ is the error computed after each new cluster is
formed. The upper bound on $Err_2$ as the number of clusters or
decompositions increases follows a proof similar to the one presented in
Theorem 1. □

Again for the same $C_p = 7$ and image (c) in figure 1, the error rate
$Err_2$ is depicted in figure 3. Intuitively, it can be seen that both
reconstruction error rates converge to approximately similar values.

The theoretical proofs and the lemmas suggest that, for a given level of
compromise $C_p$, there exists an upper bound on the reconstruction error as
well as on the number of clusters. But this reconstruction error and the
number of clusters depend on the pixel sequence presented to the learner.
Does this mean that, for a particular level of compromise, one may find
values of reconstruction error and number of clusters that never converge to
a finite value when a random sample of pixel sequences that represent an
image is processed by the learner? Or, put more simply, is it possible to
find a reconstruction error and a number of clusters at a particular level of
compromise that best represent the image? This points to the problem of
whether an image can be reconstructed at a particular level of compromise
where there is a high probability of finding a low reconstruction error and
number of clusters from a sample of sequences.
Fig. 4. The probability density estimates for (a) $Err_1$, (b) $Err_2$ and
(c) the number of clusters obtained via the unsupervised conformal learner,
generated over 1000 random sequences representing the same image with
$C_p = 10$.



The existence of such a probability value would require knowledge of the
probability distribution of the reconstruction error over increasing
(1) number of examples and (2) number of clusters generated. In this work,
kernel density estimation (KDE) is used to estimate the probability
distribution of the reconstruction error rates $Err_1$ and $Err_2$. To
investigate the quality of the solution obtained, the error rates were
generated for different random sequences and a KDE was evaluated on the
observations. The density estimates empirically point to the least error
rates with high probability. It was found that the error rates $Err_1$,
$Err_2$ and the number of clusters all converge to a particular value for a
given image.

Fig. 5. Behaviour of the reconstruction error (via KDE) and the number of 
clusters or decompositions (via KDE) for increasing values of C_p. 



For C_p = 10, the probability density estimates were generated from the error 
rates and the number of clusters obtained on 1000 random sequences of the same 
image. It was found that the error rates Err_1 and Err_2 and the number of 
clusters converge to 33.1762, 35.9339 and 38, respectively. Figure 4 shows the 
corresponding graphs. It can be seen from graphs (a) and (b) in figure 4 that 
Err_1 and Err_2 converge to nearly the same value. 
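
The KDE step described above can be sketched as follows. This is a minimal illustration and not the implementation used for the reported results: the function simulated_final_error is a hypothetical stand-in for running the learner on one random pixel sequence, the data are synthetic, and the Gaussian kernel with a normal reference bandwidth is an assumption, since the kernel and bandwidth are not stated here.

```python
import math
import random
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    if bandwidth is None:
        # Normal reference rule of thumb (assumed; not taken from the paper).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def kde_mode(samples, grid_size=512):
    """Return the grid point at which the estimated density is highest."""
    density = gaussian_kde(samples)
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    return max(grid, key=density)


def simulated_final_error(rng):
    """Hypothetical stand-in: final reconstruction error for one random sequence.

    Synthetic values only; a real run would process a shuffled pixel
    sequence with the learner and return the error it converges to.
    """
    return rng.gauss(33.0, 1.5)


rng = random.Random(0)
errors = [simulated_final_error(rng) for _ in range(1000)]  # 1000 sequences
mode = kde_mode(errors)
print(f"most probable error rate (synthetic data): {mode:.2f}")
```

The same routine could be applied to the observed number of clusters per sequence; the mode of each density estimate is then read as the value to which the corresponding quantity converges with highest probability.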

It can be noted that, with increasing values of the parameter C_p, the bound 
on the decomposition expands, which in turn lowers the number of clusters 
required to reconstruct the image. Thus it can be expected that at lower 
levels of compromise the reconstruction error (via KDE) is low but the number 
of clusters (via KDE) is very high, and vice versa. Figure 5 shows the 
behaviour of the reconstruction error and the number of clusters generated as 
the level of compromise increases. A high reconstruction error does not 
necessarily mean that the representation of the image is bad; it only 
reflects the granularity of the reconstruction obtained. Thus the 
reconstruction of the image can yield finer details at a low level of 
compromise and point to segmentations at a high level of compromise. 
Regularization over the level of compromise and the number of clusters would 
lead to a reconstruction that has a low reconstruction error as well as an 
adequate number of decompositions to represent the image properly. 
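
For reference, one standard form of the multivariate Chebyshev inequality makes the effect of the parameter explicit. Identifying the threshold \varepsilon with the level of compromise C_p is an assumption made here for illustration only; the exact form used by the algorithm is the one stated in the earlier sections.

```latex
% X is a random vector in R^N with mean \mu and nonsingular covariance \Sigma.
P\left\{ (X-\mu)^{T}\,\Sigma^{-1}\,(X-\mu) \ge \varepsilon \right\}
  \;\le\; \frac{N}{\varepsilon}, \qquad \varepsilon > 0 .

% The complementary region is the ellipsoid
%   E_\varepsilon = \{ x : (x-\mu)^{T}\Sigma^{-1}(x-\mu) < \varepsilon \},
% whose volume grows with the threshold:
\operatorname{vol}(E_\varepsilon) \;=\; V_N \,\varepsilon^{N/2}\, \sqrt{\det \Sigma},
% where V_N is the volume of the unit ball in R^N.
```

Under that assumption, a larger C_p enlarges the ellipsoid within which a new example is absorbed into an existing cluster, which is consistent with fewer clusters and a coarser reconstruction at higher levels of compromise.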

A few points need to be kept in mind when applying such an online learning 
paradigm. The reconstructed results come close to the original image only up 
to the imposed level of compromise. As the size of the dataset or the image 
increases, the time consumed and the number of computations involved in 
processing also increase. The learner performs better on clean images than on 
noisy images, so adaptations need to be made for processing noisy images, or 
pre-processing becomes a necessary step before applying such an algorithm. 
Other inequalities can also be taken into account for processing multivariate 
information online. It would be difficult to compare the algorithm with other 
powerful clustering algorithms, as the proposed work presents a weak learner 
and provides a general solution with no tight bounds on the quality of 
clustering. 

Nevertheless, the current work contributes to the estimation of the number of 
clusters in an unsupervised paradigm using a transductive-inductive learning 
strategy. It can be said that, for a fixed Chebyshev parameter in a 
bootstrapped sequence sampling environment without replacement, the 
unsupervised learner converges to a finite error rate along with a finite 
number of clusters. The result in terms of clustering and error rates may not 
be optimal (where the meaning of optimal depends on the goal of the 
optimization), but it does give an affirmative indication that image 
decomposition is robust and convergent. 

4 Conclusion 

A simple transductive-inductive learning strategy for the unsupervised 
learning paradigm is presented, based on the multivariate Chebyshev 
inequality. Theoretical proofs of convergence in the number of clusters for a 
particular level of compromise show (1) stability of the result over a 
sequence and (2) robustness of the probabilistically estimated approximation 
of the number of clusters over a random sample of sequences representing the 
same multidimensional data. Lastly, the upper bounds derived on the number of 
clusters point to a limited search space. 